809 research outputs found

    Generation of Policy-Level Explanations for Reinforcement Learning

    Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, $O(|F|^2 |tr\_samples|)$. By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains. Comment: Accepted to Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (2019).

    Language-based sensing descriptors for robot object grounding

    In this work, we consider an autonomous robot that is required to understand commands given by a human through natural language. Specifically, we assume that this robot is provided with an internal representation of the environment. However, such a representation is unknown to the user. In this context, we address the problem of allowing a human to understand the robot's internal representation through dialog. To this end, we introduce the concept of sensing descriptors. Such representations are used by the robot to recognize unknown object properties in the given commands and warn the user about them. Additionally, we show how these properties can be learned over time by leveraging past interactions in order to enhance the grounding capabilities of the robot.

    Graph-based task libraries for robots: generalization and autocompletion

    In this paper, we consider an autonomous robot that persists over time performing tasks and the problem of providing one additional task to the robot's task library. We present an approach to generalize tasks, represented as parameterized graphs with sequences, conditionals, and looping constructs of sensing and actuation primitives. Our approach performs graph-structure task generalization, while maintaining task executability and parameter value distributions. We present an algorithm that, given the initial steps of a new task, proposes an autocompletion based on a recognized past similar task. Our generalization and autocompletion contributions are effective on different real robots. We show concrete examples of the robot primitives and task graphs, as well as results, with Baxter. In experiments with multiple tasks, we show a significant reduction in the number of new task steps to be provided.

    Automated Formula Generation and Performance Learning for the FFT

    A single signal processing algorithm can be represented by many different but mathematically equivalent formulas. When these formulas are implemented in actual code, they often have very different running times. Thus, an important problem is finding a formula that implements the signal processing algorithm as efficiently as possible. In this paper we present three major results toward this goal: (1) different but mathematically equivalent formulas can be generated automatically in a principled way, (2) simple features describing formulas can be used to distinguish formulas with significantly different running times, and (3) a function approximator can learn to accurately predict the running time of a formula given a limited set of training data.

    Effective Multi-Model Motion Tracking Under Multiple Team Member Actuators
